Data visualization

https://xkcd.com/688/

Data visualization

  • What is data visualization?
    • A fancy buzzword for making diagrams?
    • Methods for the visual presentation of quantitative information?
    • The intersection of art and science?
  • Important skills for science communication!

History

Minard’s (1896) diagram of the French invasion of Russia

John Snow’s (1854) Cholera map

Categories of visuals

  • Claus Wilke groups visualization methods into the following categories:
    • Amounts - visualizing counts or totals
    • Distributions
    • Proportions
    • x-y relationships (Associations)
    • Geospatial data
    • Uncertainty

Amounts: barplots

  • A barplot is a visualization of a quantitative measure partitioned by a categorical variable.
frogs <- read.csv("../data/frogs.csv")
table(frogs$Species.Code)

AMTO BULL CHFR GRFR NLFR SPPE 
 110   31    4  154  151  343 
par(mar=c(5,5,1,1), mfrow=c(1,2))
barplot(table(frogs$Species.Code), cex.names=0.8)
barplot(table(frogs$Species.Code), cex.names=0.8, horiz=T, las=1)

Grouped barplots

barplot(table(frogs$Pelee.Station, frogs$Species.Code), beside=T, xlab='Species code')

Stacked barplots

barplot(table(frogs$Species.Code, frogs$Pelee.Station), legend=T, args.legend=list(x=17, y=110, bty='n', horiz=T), xlab='Monitoring station', ylab='Recorded calls/day')

Bad barplots

par(mfrow=c(1,2), mar=c(4,5,2,1)); tab <- table(frogs$Species.Code)
barplot(tab[tab>100], ylim=c(100, max(tab)), xpd=FALSE, main="y-axis above 0")
barplot(log10(table(frogs$Species.Code, frogs$Pelee.Station)+1), main="Log-transformed stacked barplot")

Distributions - univariate

par(mfrow=c(1,2))
x <- frogs$Air.Temperature[!is.na(frogs$Air.Temperature)]
hist(x); rug(x); plot(density(x))

Distributions - univariate by factor

par(mfrow=c(1,2))
boxplot(split(iris$Sepal.Length, iris$Species), ylab='Sepal length')
sm(require(vioplot)); vioplot(split(iris$Sepal.Length, iris$Species))

Distributions - univariate by factor (2)

par(mar=c(5,7,1,1), mfrow=c(1,2))
sm(require(ggfree)); pal <- gg.rainbow(3, c=50)
x <- split(iris$Sepal.Length, iris$Species); ridgeplot(x, col=pal, fill=add.alpha(pal, 0.3), bty='n', xlab='Sepal length')
sm(require(beeswarm)); beeswarm(x)

Distributions - bivariate continuous

temp <- frogs[!is.na(frogs$Air.Temp) & !is.na(frogs$Cloud.Cover), ]
par(mfrow=c(1,3)); plot(temp$Air.Temp, temp$Cloud.Cover)
sm(require(MASS)); k <- kde2d(temp$Air.Temp, temp$Cloud.Cover)
image(k); contour(k)

Distributions - bivariate in 3D

require(rgl)
persp3d(k, lit=F, front='lines', back='lines')

You must enable Javascript to view this page properly.

Proportions - pies and donuts

par(mfrow=c(1,2), mar=c(1,1,1,1))
pie(table(frogs$Species))
ringplot(table(frogs$Species), r0=0.3, r1=0.6, use.names=T, offset=0.1)

Proportions - stream plots

par(mar=c(3,1,1,1)); n <- ncol(cbc); pal <- gg.rainbow(n)
stackplot(cbc[2:n], x=cbc[,1], type='w', bty='n', yaxt='n', spline=T, border='white', lwd=0.25, col=pal)
labels <- gsub('\\.', ' ', names(cbc)[2:n])
legend(x=1959, y=-130, legend=rev(labels), bty='n', cex=0.7, fill=rev(pal), y.intersp=0.8)

Proportions - contingency tables

tab <- table(frogs$Calling.Code, toupper(frogs$Precipitation))
plot(tab, shade=TRUE, las=1, main=NA, xlab='Calling code')  # mosaicplot

Associations - xy-plots

par(mfrow=c(1,2))
plot(cars, xlab='Speed', ylab='Stopping distance')
sm(require(corrplot)); corrplot(cor(mtcars))

Associations - repeated measures

slopegraph(co2.emissions)

Aesthetics

  • Where data visualization intersects art!
  • Aesthetics are not entirely subjective – or at least, there is some consistency.
  • Some components of aesthetics according to Wilke:
    • position
    • shape
    • size
    • colour
    • line width, type

Colour palettes

par(mfrow=c(2,3), mar=c(2,2,1,1))
barplot(tab, col=terrain.colors(6), main='Base R terrain colors'); barplot(tab, col=piratepal('pony'), main='yarrr piratepal pony'); barplot(tab, col=wes_palette("Zissou1", 6, type = "continuous"), main='wesanderson Zizzou1')
barplot(tab, col=brewer.pal(6, 'Set2'), main='RColorBrewer Set2'); barplot(tab, col=viridis(6), main='Viridis default'); barplot(tab, col=magma(6), main='Viridis magma')

Accessibility

  • Visual impairment
    • Small labels are bad (the ggplot2 default settings are awful)
    • Use a substantial contrast between foreground and background
    • Colours and shapes that are too similar are difficult to distinguish
  • Try to provide alternate text for images you are distributing online
  • Roughly 8% of men and 0.4% of women have some form of colour blindness.
    • Try to use colour palettes that accommodate the colour blind.

Alternate text

plot(rnorm(100), rnorm(100))

This is how you add alternate text in RMarkdown

Colour blindness

require(dichromat)
pal <- rainbow(6); par(mfrow=c(1,2), mar=c(5,5,1,1))
barplot(tab, col=pal); barplot(tab, col=dichromat(pal, type='tritan'))

Further reading